Model Selection in Reinforcement Learning with General Function Approximations
Authors
Abstract
We consider model selection for classic Reinforcement Learning (RL) environments – Multi Armed Bandits (MABs) and Markov Decision Processes (MDPs) – under general function approximations. In the model selection framework, we do not know the function classes, denoted by $$\mathcal{F}$$ and $$\mathcal{M}$$, where the true models – the reward generating function for MABs and the transition kernel for MDPs – lie, respectively. Instead, we are given M nested (hypothesis) classes such that the true model is contained in at least one such class. In this paper, we propose and analyze efficient model selection algorithms for MABs and MDPs that adapt to the smallest class (among the nested M classes) containing the true underlying model. Under a separability assumption on the nested hypothesis classes, we show that the cumulative regret of our adaptive algorithms matches that of an oracle which knows the correct function classes (i.e., $$\mathcal{F}^*$$ and $$\mathcal{M}^*$$) a priori. Furthermore, for both settings, we show that the cost of model selection is an additive term in the regret having weak (logarithmic) dependence on the learning horizon T.
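The nested-class selection idea in the abstract can be illustrated with a toy regression version: fit the data with each class in the nest and keep the smallest class whose fit is comparable to that of the richest class. This is a minimal sketch under assumed polynomial classes and a hypothetical tolerance rule, not the paper's algorithm or its regret analysis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Nested hypothesis classes: polynomials of increasing degree.
# The true model living in the degree-2 class is an assumption
# made for this illustration, not taken from the paper.
def features(x, degree):
    return np.vander(x, degree + 1, increasing=True)

true_degree = 2
theta_star = np.array([0.5, -1.0, 2.0])

x = rng.uniform(-1, 1, size=500)
y = features(x, true_degree) @ theta_star + 0.1 * rng.standard_normal(500)

# Fit every class in the nest and record its mean squared residual.
max_degree = 5
errors = []
for d in range(max_degree + 1):
    Phi = features(x, d)
    theta_hat, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    errors.append(np.mean((Phi @ theta_hat - y) ** 2))

# Adaptive selection: smallest class whose error is within a
# hypothetical tolerance of the richest class's error (a stand-in
# for the separability-based tests analyzed in the paper).
tol = 1.1 * errors[-1]
selected = next(d for d in range(max_degree + 1) if errors[d] <= tol)
print(selected)
```

With these assumed parameters the degree-0 and degree-1 fits leave a large bias term, so the procedure settles on the smallest adequate class rather than the richest one.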
Similar Resources
Abstraction Selection in Model-based Reinforcement Learning
Abstraction Selection in Model-Based Reinforcement Learning. Nan Jiang, Alex Kulesza, Satinder Singh {NANJIANG,KULESZA,BAVEJA}@UMICH.EDU, Computer Science & Engineering, University of Michigan
Convergence of Reinforcement Learning with General Function Approximators
A key open problem in reinforcement learning is to assure convergence when using a compact hypothesis class to approximate the value function. Although the standard temporal-difference learning algorithm has been shown to converge when the hypothesis class is a linear combination of fixed basis functions, it may diverge with a general (nonlinear) hypothesis class. This paper describes the Bridg...
PAC-Bayesian Model Selection for Reinforcement Learning
This paper introduces the first set of PAC-Bayesian bounds for the batch reinforcement learning problem in finite state spaces. These bounds hold regardless of the correctness of the prior distribution. We demonstrate how such bounds can be used for model-selection in control problems where prior information is available either on the dynamics of the environment, or on the value of actions. Our...
Value-Aware Loss Function for Model Learning in Reinforcement Learning
We consider the problem of estimating the transition probability kernel to be used by a model-based reinforcement learning (RL) algorithm. We argue that estimating a generative model that minimizes a probabilistic loss, such as the log-loss, might be an overkill because such a probabilistic loss does not take into account the underlying structure of the decision problem and the RL algorithm tha...
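The contrast this snippet draws – a generic probabilistic loss versus a loss tied to the decision problem – can be shown with a toy example. The states, value function, and candidate kernels below are illustrative assumptions, not taken from the paper: log-loss prefers the candidate closest to the true distribution in KL terms, while a value-aware loss prefers the candidate whose expected value under the agent's value function matches the truth.

```python
import numpy as np

# Toy next-state distributions over 3 states. All numbers here are
# illustrative assumptions, not from the paper.
p_true = np.array([0.5, 0.3, 0.2])   # true transition row
V = np.array([0.0, 1.0, 2.0])        # value function the RL agent uses

# Two candidate models for p_true.
candidates = np.array([
    [0.45, 0.40, 0.15],  # matches the expected value under V exactly
    [0.50, 0.25, 0.25],  # closer to p_true in log-loss terms
])

# Log-loss: cross-entropy against the true distribution.
log_loss = -(p_true * np.log(candidates)).sum(axis=1)

# Value-aware loss: mismatch in expected value under V.
value_loss = np.abs(candidates @ V - p_true @ V)

print(log_loss.argmin(), value_loss.argmin())  # the two losses disagree
```

The two criteria rank the candidates differently, which is the snippet's point: a model that is worse as a density estimate can be better for the downstream decision problem.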
Nonparametric General Reinforcement Learning
Reinforcement learning problems are often phrased in terms of Markov decision processes (MDPs). In this thesis we go beyond MDPs and consider reinforcement learning in environments that are non-Markovian, non-ergodic and only partially observable. Our focus is not on practical algorithms, but rather on the fundamental underlying problems: How do we balance exploration and exploitation? How do w...
Journal
Journal title: Lecture Notes in Computer Science
Year: 2023
ISSN: 1611-3349, 0302-9743
DOI: https://doi.org/10.1007/978-3-031-26412-2_10